@Pascal-So

Description of changes

Fixes #5241.

As discussed there, we align the types of the provided embedding functions, such that they all implement EmbeddingFunction[Embeddable]. This was the second option among my proposed solutions.
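For readers who have not followed the issue, here is a minimal sketch of the kind of mismatch being addressed. It does not import chromadb: Documents, Images, Embeddable, Embeddings, EmbeddingFunction and create_collection below are simplified, hypothetical stand-ins, and the real definitions may differ. The assumption illustrated is that the input type parameter is contravariant, so a function annotated only for Documents is not accepted where EmbeddingFunction[Embeddable] is expected.

```python
from typing import List, Protocol, TypeVar, Union

# Hypothetical stand-ins for the library's aliases (assumptions, not the real definitions).
Documents = List[str]
Images = List[bytes]
Embeddable = Union[Documents, Images]
Embeddings = List[List[float]]

# Input types are consumed, so the parameter is modelled as contravariant.
D = TypeVar("D", contravariant=True)


class EmbeddingFunction(Protocol[D]):
    def __call__(self, input: D) -> Embeddings: ...


class TextOnly:
    """Annotated for Documents only, so it matches EmbeddingFunction[Documents]."""

    def __call__(self, input: Documents) -> Embeddings:
        return [[float(len(doc))] for doc in input]


class AnyInput:
    """Annotated for Embeddable, so it matches EmbeddingFunction[Embeddable]."""

    def __call__(self, input: Embeddable) -> Embeddings:
        return [[float(len(item))] for item in input]


def create_collection(
    name: str, embedding_function: EmbeddingFunction[Embeddable]
) -> Embeddings:
    # Hypothetical API surface: it only promises to pass some Embeddable value.
    return embedding_function(["hello", "hi"])


accepted = create_collection("ok", AnyInput())
# A strict type checker is expected to reject the next call, because TextOnly
# does not accept every Embeddable. At runtime it still works.
flagged = create_collection("flagged", TextOnly())
```

Both calls return the same values at runtime; only the static check differs, which is why aligning all provided functions on one type parameter removes the need for ignore comments.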

Test plan

Both code snippets given in the issue now type-check with mypy in strict mode, whereas previously only one of them passed.

I'll see what GitHub Actions says about all the existing tests before fiddling around with installing k8s / tilt / whatever else is needed to run those.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

This is technically a breaking change for anyone who wrote something along these lines:

class MyCustomEmbeddingFunction(EmbeddingFunction[Documents]):
    ...

client.get_or_create_collection("asdf", configuration={"embedding_function": MyCustomEmbeddingFunction()})

Any users of the built-in embedding functions won't notice anything.

If you consider this to be an issue, then I guess we'll have to re-add some # type: ignore comments, which sort of defeats the whole point of this PR. Let me know what you think.
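A rough sketch of what the migration could look like for such a custom class, again using hypothetical stand-in types rather than the real chromadb imports. The class is re-parameterised with Embeddable and rejects non-text items at runtime; that runtime guard is an illustrative choice, not something this PR prescribes.

```python
from typing import List, Protocol, TypeVar, Union

# Hypothetical stand-ins for the library's aliases (assumptions, not the real definitions).
Documents = List[str]
Images = List[bytes]
Embeddable = Union[Documents, Images]
Embeddings = List[List[float]]

D = TypeVar("D", contravariant=True)


class EmbeddingFunction(Protocol[D]):
    def __call__(self, input: D) -> Embeddings: ...


# Before: class MyCustomEmbeddingFunction(EmbeddingFunction[Documents])
class MyCustomEmbeddingFunction(EmbeddingFunction[Embeddable]):
    def __call__(self, input: Embeddable) -> Embeddings:
        embeddings: Embeddings = []
        for item in input:
            # The signature now admits every Embeddable, so text-only
            # implementations check the payload themselves.
            if not isinstance(item, str):
                raise ValueError("MyCustomEmbeddingFunction only supports text input")
            embeddings.append([float(len(item))])  # placeholder embedding
        return embeddings


fn = MyCustomEmbeddingFunction()
text_result = fn(["abc", "de"])
```

The body of the class barely changes; only the type parameter and the annotation on input are widened.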

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

No documentation changes needed, since as far as I can tell the docs on docs.trychroma.com don't show any type annotations anyway.

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@Pascal-So Pascal-So marked this pull request as ready for review November 14, 2025 23:12
@propel-code-bot

Standardise EmbeddingFunction generic to Embeddable across the codebase

This pull request replaces every remaining use of EmbeddingFunction[Documents] with the broader EmbeddingFunction[Embeddable]. The change eliminates lingering mypy-strict errors, unifies Python, JS and Rust expectations, and ensures that any string-like payload accepted by the Embeddable alias (single string, list of strings, etc.) can be passed to any built-in or custom embedding implementation without type ignores. No runtime behaviour changes; only static-typing signatures are updated.

The modification touches more than 20 embedding modules and the public client/collection APIs that accept an EmbeddingFunction. Downstream users who explicitly parameterised their own classes or type aliases with Documents will need to migrate to Embeddable, but existing code that merely uses the functions at runtime will continue to work unchanged.

Key Changes

• All embedding classes (OpenAI, HuggingFace, Mistral, Jina, Instructor, etc.) now subclass EmbeddingFunction[Embeddable] instead of EmbeddingFunction[Documents]
• Public API surfaces (client, async_client, collections, collection_configuration) updated to accept EmbeddingFunction[Embeddable]
• Updated test suites and utility re-exports to align with the new generic
• Removed mypy type-ignore work-arounds previously required when mixing embedding implementations

Affected Areas

• chromadb/embeddings/*
• chromadb/api/client.py, async_client.py
• chromadb/models/CollectionCommon.*
• chromadb/utils/__init__.py
• tests that referenced Documents

This summary was automatically generated by @propel-code-bot


Development

Successfully merging this pull request may close these issues.

[Bug]: text-only embedding functions can't be used typesafely
